Abstract. I give an introduction to algorithmic uses of the principle of
inclusion-exclusion. The presentation is intended to be concrete and
accessible, at the expense of generality and comprehensiveness.



1 The principle of inclusion-exclusion. There are as many odd-sized as
even-sized subsets sandwiched between two different sets: For $R \subseteq T$,

$$\sum_{R \subseteq S \subseteq T} (-1)^{|T \setminus S|} = [R = T] . \tag{1}$$

We use Iverson notation $[P]$ for proposition $P$, meaning $[P] = 1$ if $P$
and $[P] = 0$ otherwise.

Proof of (1). If $R = T$ then there is exactly one sandwiched set, namely $S = T$.
Otherwise we set up a bijection between the odd- and even-sized subsets as
follows. Fix $t \in T \setminus R$. For every odd-sized subset $S_1$ with
$R \subseteq S_1 \subseteq T$ let $S_0 = S_1 \oplus \{t\}$ denote the symmetric
difference of $S_1$ with $\{t\}$. Note that the size of $S_0$ is even and that
$S_0$ contains $R$. Furthermore, $S_1$ can be recovered from $S_0$
as $S_1 = S_0 \oplus \{t\}$. □
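Identity (1) is easy to test by brute force on small sets. The following Python sketch is an illustration added here, not part of the original presentation; the function name `sandwich_sum` is an assumption made for the example.

```python
from itertools import combinations

def sandwich_sum(R, T):
    """Sum of (-1)^|T minus S| over all S with R <= S <= T (assumes R <= T)."""
    free = sorted(T - R)                 # elements that S may or may not contain
    total = 0
    for size in range(len(free) + 1):
        for extra in combinations(free, size):
            S = R | set(extra)
            total += (-1) ** len(T - S)
    return total

T = {1, 2, 3, 4}
print(sandwich_sum({1, 2}, T))  # R != T: odd and even terms cancel
print(sandwich_sum(T, T))       # R == T: the single term S = T survives
```

The first call should print 0 and the second 1, matching the Iverson bracket on the right hand side of (1).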

Perspective. We will see the (perhaps more familiar) formulation of the principle
of inclusion-exclusion in terms of intersecting sets in §6 and another equivalent
formulation in §11.



2 Graph colouring. A $k$-colouring of a graph $G = (N, E)$ on
$n = |N|$ nodes assigns one of $k$ colours to every node such that
neighbouring nodes have different colours. In any such colouring,
the nodes of the same colour form a nonempty independent
set, a set of nodes none of which are neighbours.

Let $g(S)$ denote the number of nonempty independent subsets
in $S \subseteq N$. Then $G$ can be $k$-coloured if and only if

$$\sum_{S \subseteq N} (-1)^{|N \setminus S|} g(S)^k > 0 . \tag{2}$$
Proof. For every $S \subseteq N$, the term $g(S)^k$ counts the number of ways to pick $k$
nonempty independent sets $I_1, \ldots, I_k$ in $S$. Thus, we can express the left hand
side of (2) as

$$\sum_{S} \sum_{I_1} \cdots \sum_{I_k} [I_1 \cup \cdots \cup I_k \subseteq S] \, (-1)^{|N \setminus S|} = \sum_{I_1} \cdots \sum_{I_k} \sum_{S} [I_1 \cup \cdots \cup I_k \subseteq S \subseteq N] \, (-1)^{|N \setminus S|} .$$

The innermost sum has the form

$$\sum_{I_1 \cup \cdots \cup I_k \subseteq S \subseteq N} (-1)^{|N \setminus S|} .$$

By (1), the only contributions come from $I_1 \cup \cdots \cup I_k = N$. Every such choice
indeed corresponds to a valid colouring: For $i = 1, \ldots, k$, let the nodes in $I_i$ have
colour $i$. (This may re-colour some nodes.) Conversely, every valid $k$-colouring
corresponds to such a choice. (In fact, the colourings are the disjoint partitions.)

□




Fig. 1. The values of $g(S)$ for all $S$ for the example graph to the left. Expression
(2) evaluates an alternating sum of the cubes of these values, in this case
$6^3 - (3^3 + 4^3 + 4^3 + 5^3) + (2^3 + 2^3 + 2^3 + 2^3 + 3^3 + 3^3) - (1^3 + 1^3 + 1^3 + 1^3) + 0 = 18$.
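The alternating sum in (2) can be evaluated literally for a small graph. The sketch below is an added illustration; the 4-node graph `EDGES` is a hypothetical example (not necessarily the graph of Fig. 1), chosen so that the sum for k = 3 reproduces the total 18 quoted in the caption. The function names are assumptions as well.

```python
from itertools import combinations

def is_independent(nodes, edges):
    """True if no edge has both endpoints in `nodes`."""
    return not any(u in nodes and v in nodes for u, v in edges)

def g_brute(S, edges):
    """Number of nonempty independent subsets of S, by plain enumeration."""
    S = sorted(S)
    return sum(
        is_independent(set(I), edges)
        for size in range(1, len(S) + 1)
        for I in combinations(S, size)
    )

def covers(nodes, edges, k):
    """Left hand side of (2): alternating sum of g(S)^k over all S."""
    n = len(nodes)
    total = 0
    for size in range(n + 1):
        for S in combinations(nodes, size):
            total += (-1) ** (n - size) * g_brute(S, edges) ** k
    return total

# Hypothetical example graph: a triangle A, B, C with a pendant node D at A.
NODES = ["A", "B", "C", "D"]
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
print(covers(NODES, EDGES, 3))  # 18: the graph is 3-colourable
print(covers(NODES, EDGES, 2))  # 0: the triangle rules out 2 colours
```

Note that the value counts ordered k-tuples of independent sets covering N, so it certifies colourability rather than counting proper colourings.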



3 Counting the number of independent sets. Expression (2) can be evaluated
in two ways:

For each $S \subseteq N$, the value $g(S)$ can be computed in time $O(2^{|S|} |E|)$ by
constructing every nonempty subset of $S$ and testing it for independence. Thus,
the total running time for evaluating (2) is within a polynomial factor of

$$\sum_{S \subseteq N} 2^{|S|} = \sum_{i=0}^{n} \binom{n}{i} 2^i = 3^n .$$

The space requirement is polynomial.



Alternatively, we first build a table with $2^n$ entries containing $g(S)$ for all
$S \subseteq N$, after which we can evaluate (2) in time and space $2^n n^{O(1)}$.

Such a table is easy to build given a recurrence for $g(S)$. We have $g(\emptyset) = 0$,
and

$$g(S) = g(S \setminus \{v\}) + g(S \setminus N[v]) + 1 \qquad (v \in S) , \tag{3}$$

where $N[v] = \{v\} \cup \{u \in N : uv \in E\}$ denotes the closed neighbourhood of $v$.

Proof of (3). Fix $v \in S$ and consider the nonempty independent sets $I \subseteq S$.
They can be partitioned into two classes: either $v \in I$ or $v \notin I$. The latter sets
are counted in $g(S \setminus \{v\})$. It remains to argue that the sets $I \ni v$ are counted in
$g(S \setminus N[v]) + 1$. We will do this by counting the equipotent family of sets $I \setminus \{v\}$
instead. Since $I$ contains $v$ and is independent, it cannot contain other nodes in
$N[v]$. Thus $I \setminus \{v\}$ is disjoint from $N[v]$ and contained in $S$. Now, either $I$ is the
singleton $\{v\}$ itself, accounted for by the '+1' term, or $I \setminus \{v\}$ is a nonempty
independent set and therefore counted in $g(S \setminus N[v])$. □
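Recurrence (3) translates directly into a table over bitmasks. The following Python sketch is an added illustration under stated assumptions: nodes are numbered 0, ..., n-1, subsets are encoded as integers, and the graph `EDGES` is a hypothetical example. The names `independent_set_table` and `count_covers` are not from the original.

```python
def independent_set_table(n, edges):
    """g[S] = number of nonempty independent subsets of bitmask S, via (3)."""
    closed = [1 << v for v in range(n)]       # closed neighbourhoods N[v]
    for u, v in edges:
        closed[u] |= 1 << v
        closed[v] |= 1 << u
    g = [0] * (1 << n)                        # g[0] = 0 is the base case
    for S in range(1, 1 << n):
        v = S.bit_length() - 1                # any v in S works; take the largest
        g[S] = g[S & ~(1 << v)] + g[S & ~closed[v]] + 1
    return g

def count_covers(n, edges, k):
    """Evaluate the alternating sum (2) from the finished table."""
    g = independent_set_table(n, edges)
    return sum(
        (-1) ** (n - bin(S).count("1")) * g[S] ** k
        for S in range(1 << n)
    )

# Hypothetical graph on nodes 0..3: triangle 0, 1, 2 plus the edge 0-3.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2)]
g = independent_set_table(4, EDGES)
print(g[0b1111], count_covers(4, EDGES, 3))
```

Because both `S \ {v}` and `S \ N[v]` are numerically smaller than `S`, filling the table in increasing order of the bitmask guarantees that the two entries on the right hand side of (3) are already available.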




Fig. 2. Three stages in the tabulation of $g(S)$ for all $S \subseteq N$ bottom-up. For
example, the value of $g(\{A, C, D\})$ is given by (3) with $v = D$ as
$g(\{A, C\}) + g(\{C\}) + 1 = 4$.



Perspective. The brute force solution for graph colouring tries all $k^n$ assignments
of colours to the nodes, which is slower for $k \ge 4$. Another approach is dynamic
programming over the subsets [15], based on the idea that $G$ can be $k$-coloured if
and only if $G[N \setminus S]$ can be $(k-1)$-coloured for some nonempty independent set $S$.
That algorithm also runs within a polynomial factor of $3^n$, but uses exponential
space. In summary, the inclusion-exclusion approach is faster than brute force,
and uses less space than dynamic programming over the subsets. The insight
that this idea applies to a wide range of sequencing and packing problems goes
back to Karp [12]; the application to graph colouring is from [2].

We use a space-time trade-off to reduce the exponential running time factor
from $3^n$ to $2^n$, applying dynamic programming to tabulate the
decrease-and-conquer recurrence (3), based on [8]. Recurrence (3) depends heavily on the
structure of independent sets; a more general approach is shown in §10.

The two strategies for computing $g(S)$ represent extreme cases of a
space-time tradeoff that can be balanced [4].



4 Perfect matchings in bipartite graphs. Consider a bipartite graph with
bipartition $(N, N)$, where $N = \{1, \ldots, n\}$, and edge set $E \subseteq N \times N$. A perfect
matching is an edge subset $M \subseteq E$ that includes every node as an endpoint
exactly once. See Fig. 3 for some interpretations.



Fig. 3. Row 1: A bipartite graph and its three perfect matchings. Row 2: In the
graph's adjacency matrix $A$, every perfect matching corresponds to a permutation
$\pi$ for which $A_{i,\pi(i)} = 1$ for all $i \in [n]$. Row 3: In the directed $n$-node graph
defined by $A$, every perfect matching corresponds to a directed cycle partition.
Bottom row: an equivalent formulation in terms of non-attacking rooks on
a chess board with forbidden positions.

The Ryser formula for counting the perfect matchings in such a graph can
be given as

$$\sum_{\pi \in S_n} \prod_{i=1}^{n} [i\pi(i) \in E] = \sum_{S \subseteq N} (-1)^{|N \setminus S|} \prod_{i=1}^{n} \sum_{j \in S} [ij \in E] , \tag{4}$$

where $S_n$ denotes the set of permutations from $N$ to $N$. The left hand side
succinctly describes the problem as iterating over all permutations and checking
if the corresponding edges (namely, $1\pi(1), 2\pi(2), \ldots, n\pi(n)$) are all in $E$. Direct
evaluation would require $n!$ iterations. The right hand side provides an equivalent
expression that can be evaluated in time $O(2^n n^2)$, see Fig. 4.

Proof of (4). For fixed $i \in N$, the value $\sum_{j \in S} [ij \in E]$ counts the number of $i$'s
neighbours in $S \subseteq N$. Thus the expression

$$\prod_{i=1}^{n} \sum_{j \in S} [ij \in E] \tag{5}$$

is the number of ways every node $i \in N$ can choose a neighbour from $S$. (This
allows some nodes to select the same neighbour.) Consider such a choice as a
mapping $g\colon N \to N$, not necessarily onto, with image $R = g(N)$. The contribution
of $g$ to (5) is 1 for every $S \supseteq R$, and its total contribution to the right hand
side of (4) is, using (1),

$$\sum_{R \subseteq S \subseteq N} (-1)^{|N \setminus S|} = [R = N] .$$

Thus $g$ contributes if and only if it is a permutation. □

Fig. 4. Inclusion-exclusion for non-attacking rooks. The top row shows all
$12 = 3 \cdot 2 \cdot 2$ ways to place exactly one rook in every board line. Every row shows the
possible placements in the vertical lines given by $S \subseteq \{1, 2, 3\}$. We omit the rows
whose contribution vanishes, namely $S = \{1\}$, $S = \{3\}$ and $S = \emptyset$. Of particular
interest is the second column, which is subtracted twice and later added again.
The entire calculation is $12 - 4 - 2 - 4 + 1 + 0 + 0 - 0 = 3$.
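Both sides of (4) can be compared numerically. The Python sketch below is an added illustration; the 0/1 matrix `A` is a hypothetical example, chosen to have row sums 3, 2, 2 and three perfect matchings so that it is consistent with the numbers quoted for Fig. 4, and the function names are assumptions.

```python
from itertools import permutations

def matchings_brute(A):
    """Left hand side of (4): test all n! permutations."""
    n = len(A)
    return sum(
        all(A[i][pi[i]] for i in range(n))
        for pi in permutations(range(n))
    )

def matchings_ryser(A):
    """Right hand side of (4): alternating sum over column subsets S."""
    n = len(A)
    total = 0
    for S in range(1 << n):                    # S is a bitmask of columns
        sign = (-1) ** (n - bin(S).count("1"))
        prod = 1
        for i in range(n):                     # neighbours of row i inside S
            prod *= sum(A[i][j] for j in range(n) if S >> j & 1)
        total += sign * prod
    return total

# Hypothetical adjacency matrix with row sums 3, 2, 2.
A = [[1, 1, 1],
     [1, 1, 0],
     [0, 1, 1]]
print(matchings_brute(A), matchings_ryser(A))
```

This direct evaluation spends O(n^2) work per subset, which matches the O(2^n n^2) bound stated for the right hand side.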

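Perspective. Bipartite matching is an example of a sequencing problem, where
inclusion-exclusion replaces an enumeration over permutations, $\sum_{\pi \in S_n}$, by an
alternating enumeration over subsets, $\sum_{S \subseteq N} (-1)^{|N \setminus S|}$, of functions with restricted
range. Typically, this reduces a factor $n!$ in the running time to $2^n$. One can
express the idea algebraically like this:

$$\begin{aligned}
\sum_{\substack{f\colon N \to N \\ f(N) = N}} [\cdots]
  &= \sum_{R} [R = N] \sum_{\substack{f\colon N \to N \\ f(N) = R}} [\cdots] \\
  &= \sum_{R} \sum_{S} [R \subseteq S] \, (-1)^{|N \setminus S|} \sum_{\substack{f\colon N \to N \\ f(N) = R}} [\cdots] \\
  &= \sum_{S} (-1)^{|N \setminus S|} \sum_{R} [R \subseteq S] \sum_{\substack{f\colon N \to N \\ f(N) = R}} [\cdots] \\
  &= \sum_{S} (-1)^{|N \setminus S|} \sum_{f\colon N \to S} [\cdots] .
\end{aligned} \tag{6}$$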

Ryser's formula is normally given in a more general form, for the permanent
$\sum_{\pi} \prod_{i} A_{i\pi(i)}$ of a matrix, where the entries can be other than just 0 and 1. The
running time can be improved to $O(2^n n)$ arithmetic operations by iterating over
the subsets of $N$ in Gray code order.

Ryser's formula [17] is a very well-known result in combinatorics and appears
in many textbooks. However, it is easy to achieve running time $O(2^n n)$ using
dynamic programming over the subsets, at the expense of space $O(2^n)$. This is
the standard approach to sequencing problems, and appears as an exercise
in Knuth [13, pp. 515-516], but usually not in the combinatorics literature. We
will witness the opposite methodological preferences in §7. Inclusion-exclusion-based
algorithms for the permanent of non-square matrices in semirings are
described in [S].

5 Perfect matchings in general graphs. We turn to graphs that are not
necessarily bipartite. In general, the number of perfect matchings in a graph with
an even number $n$ of nodes $N$ is

$$\sum_{S \subseteq N} (-1)^{|N \setminus S|} \binom{e[S]}{n/2} , \tag{7}$$

where $e[S]$ denotes the number of edges between nodes in $S \subseteq N$.

Proof. The term $\binom{e[S]}{n/2}$ counts the number of ways to select $n/2$ distinct edges
with endpoints in $S$. (The edges are distinct, but may share nodes.) Consider
such a selection $F \subseteq E$ and let $R = \bigcup_{uv \in F} \{u, v\}$ denote the nodes covered by
the selected edges. The total contribution of $F$ to the right hand side of (7) is

$$\sum_{R \subseteq S \subseteq N} (-1)^{|N \setminus S|} = [R = N] ,$$

using (1). Thus, $F$ contributes 1 if and only if it covers all nodes. Since $F$ contains
$n/2$ edges, it must be a perfect matching. □

The running time is within a polynomial factor of $2^n$, and the space is
polynomial; see Fig. 5.
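Formula (7) needs only the edge counts e[S], which makes it a few lines of code. The sketch below is an added illustration with assumed names; the example graph has n = 6 nodes and m = 7 edges and contains a triangle, but it is a hypothetical graph and is not claimed to be the one drawn in Fig. 5. A brute-force count over all ways to pick n/2 edges serves as a cross-check.

```python
from itertools import combinations
from math import comb

def perfect_matchings_ie(n, edges):
    """Formula (7): sum over S of (-1)^|N minus S| * C(e[S], n/2)."""
    total = 0
    for S in range(1 << n):                    # S is a bitmask of nodes
        e_S = sum(1 for u, v in edges if S >> u & 1 and S >> v & 1)
        total += (-1) ** (n - bin(S).count("1")) * comb(e_S, n // 2)
    return total

def perfect_matchings_brute(n, edges):
    """Pick n/2 edges and keep the selections that cover all n nodes."""
    return sum(
        len({x for e in F for x in e}) == n
        for F in combinations(edges, n // 2)
    )

# Hypothetical graph: triangle 0, 1, 2 with pendant edges and one extra edge.
EDGES = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 4), (2, 5), (3, 4)]
print(perfect_matchings_ie(6, EDGES), perfect_matchings_brute(6, EDGES))
```

The brute-force version enumerates exactly the 35 edge triples that the sieve sifts through; the inclusion-exclusion version never lists them.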

Perspective. Perfect matchings in general graphs is a packing or partitioning
problem, while the bipartite case was a sequencing problem and the graph
colouring example in §2 was a covering problem. (Admittedly, the distinction
between these things is not very clear.) The application is from [2], which also
contains another space-time trade-off based on matrix multiplication.

The point of the large example in Fig. 5 is to illustrate the intuition that
inclusion-exclusion is a sieve. We start with a large collection of easy-to-compute
objects (the top row in Fig. 5), and let the alternating sum perform a cancellation
that sifts through the objects and keeps only the interesting ones in the sieve.



Fig. 5. The perfect matching algorithm for a graph with $n = 6$ and $m = 7$. There are $\binom{7}{3} = 35$ ways to pick 3 edges out of 7,
shown in the top row. The triangle appears in 7 other terms (4 negative, 3 positive), the two perfect matchings appear only
once.



6 Inclusion-exclusion for sets. If two sets $A$ and $B$ have no elements
in common, then we have the principle of addition: $|A \cup B| = |A| + |B|$.
In general, the equality does not hold, and all we have
is $|A \cup B| \le |A| + |B|$. Observing that every element of $A \cap B$ is
counted exactly twice on the right hand side allows us to subtract the error term:

$$|A \cup B| = |A| + |B| - |A \cap B| ,$$

often called the principle of inclusion-exclusion.

Actually, that's just a special case; the formula is elevated
to a principle by generalising to more sets. For three sets, the
formula

$$|A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |A \cap C| - |B \cap C| + |A \cap B \cap C|$$

can be verified by staring at a Venn diagram. The right-hand side
contains all the possible intersections of $A$, $B$, and $C$, with signs depending on
how many sets intersect. Generalising this leads us to

$$|A_1 \cup \cdots \cup A_n| = \sum_{\emptyset \ne S \subseteq N} (-1)^{|S|+1} \Bigl| \bigcap_{i \in S} A_i \Bigr| , \tag{8}$$

where $N = \{1, \ldots, n\}$. Equivalently, the number of elements not in any $A_i$ is

$$\bigl| \overline{A_1 \cup \cdots \cup A_n} \bigr| = \sum_{S \subseteq N} (-1)^{|S|} \Bigl| \bigcap_{i \in S} A_i \Bigr| , \tag{9}$$

with the usual convention that the 'empty' intersection $\bigcap_{i \in \emptyset} A_i$ equals the
universe from which the sets are taken.

Proof of (9). We consider the contribution of every element $a$.

Let $T = \{i \in N : a \in A_i\}$ denote the index set of the sets containing $a$.
The contribution of $a$ to the left hand side of (9) is $[T = \emptyset]$. To determine its
contribution to the right hand side, we observe that $a$ belongs to the intersection
$\bigcap_{i \in T} A_i$ and all its sub-intersections, so it contributes 1 to all corresponding
terms. More precisely, the total contribution of $a$ is given by

$$\sum_{S \subseteq T} (-1)^{|S|} = (-1)^{|T|} \sum_{S \subseteq T} (-1)^{|T \setminus S|} = (-1)^{|T|} [T = \emptyset] = [T = \emptyset] ,$$

using (1) with $R = \emptyset$. □
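Formula (9) can be tried out on a classical sieve example. The Python sketch below is an added illustration; the choice of universe (the integers 1 to 30) and of the sets (multiples of 2, 3 and 5) is an assumption made for the example, as is the function name `outside_union`.

```python
from itertools import combinations

def outside_union(universe, sets):
    """Right hand side of (9): number of elements in none of the sets."""
    n = len(sets)
    total = 0
    for size in range(n + 1):
        for idx in combinations(range(n), size):
            common = set(universe)             # empty intersection = universe
            for i in idx:
                common &= sets[i]
            total += (-1) ** size * len(common)
    return total

U = set(range(1, 31))
multiples = [{x for x in U if x % p == 0} for p in (2, 3, 5)]
print(outside_union(U, multiples))             # numbers in U coprime to 30
```

The result should agree with counting the complement of the union directly, which for this example is 8.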

Perspective. Expressions (8) and (9) are the standard textbook presentation of
inclusion-exclusion. We derived them from (1) with $R = \emptyset$. Let us show the
opposite derivation to see that the two formulations are equivalent.

Let $T$ be a nonempty, finite set and write $T = \{1, \ldots, n\}$. Consider the family
of identical subsets $A_i = \{1\}$ for all $i \in T$. Their union and every nonempty
intersection is $\{1\}$. Thus, from (8),

$$1 = \sum_{\emptyset \ne S \subseteq T} (-1)^{|S|+1} \cdot 1 ,$$

which gives (1) for $R = \emptyset$ after subtracting 1 from both sides.





7 Hamiltonian paths. A walk in a graph is a sequence of neighbouring nodes
$v_1, \ldots, v_k$. Such a walk is a path if every node appears at most once, and a path
is Hamiltonian if it includes every vertex in $G$. For ease of notation we also
assume that all Hamiltonian paths start in node $v_1 = 1$.

Given a graph $G$ on $n$ nodes $N$ let $a(X)$ denote the number of walks of
length $n$ that start in 1 and 'avoid' the nodes in $X \subseteq N$, i.e., walks of the form
$1 = v_1, \ldots, v_n$ with $v_i \notin X$ for all $1 \le i \le n$. A walk of length $n$ that avoids
no node visits every node exactly once, so the Hamiltonian paths in $G$ are exactly
the walks that avoid no node.

Let $A_i$ denote the walks that avoid $\{i\}$. Then the number of Hamiltonian paths is
$\bigl| \overline{\bigcup_{i \in N} A_i} \bigr|$ and $a(X) = \bigl| \bigcap_{i \in X} A_i \bigr|$. Thus, from (9), the number of
Hamiltonian paths in $G$ is

$$\sum_{X \subseteq N} (-1)^{|X|} a(X) .$$

For every $X$, the value $a(X)$ can be computed in polynomial time using
dynamic programming (over the lengths and endpoints, not over the subsets).
For $t \in N$ and $k = 1, \ldots, n$ let for a moment $a^k(X, t)$ denote the number of walks
of the form $1 = v_1, \ldots, v_k = t$ with $v_i \notin X$. Then we can set $a^1(X, v) = [v = 1]$
and

$$a^{k+1}(X, t) = \sum_{v \in N} a^k(X, v) \, [vt \in E] .$$

The total time to compute $a(X) = \sum_{t \in N} a^n(X, t)$ becomes $O(n^2 |E|)$, using
polynomial space. It follows that Hamiltonicity in an $n$-node graph can be decided
(in fact, counted) in time $O(2^n n^2 |E|)$ and linear space.
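The walk-counting sieve can be sketched in a few lines of Python. This is an added illustration under stated assumptions: nodes are numbered 0, ..., n-1, the start node is 0 instead of 1, subsets X are bitmasks, and the function names are hypothetical. The sketch also makes explicit two conventions that the definition of a(X) implies: a walk may never end in a node of X, and a(X) = 0 whenever X contains the start node.

```python
from itertools import permutations

def walks_avoiding(n, adj, X, start=0):
    """a(X): walks on n nodes from `start` that never enter bitmask X."""
    if X >> start & 1:
        return 0
    a = [0] * n
    a[start] = 1                               # a^1(X, v) = [v = start]
    for _ in range(n - 1):                     # extend the walk one node at a time
        a = [
            0 if X >> t & 1 else sum(a[v] for v in adj[t])
            for t in range(n)
        ]
    return sum(a)

def hamiltonian_paths(n, edges, start=0):
    """Alternating sum of a(X) over all subsets X."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return sum(
        (-1) ** bin(X).count("1") * walks_avoiding(n, adj, X, start)
        for X in range(1 << n)
    )

def hamiltonian_paths_brute(n, edges, start=0):
    """Cross-check: try every ordering of the remaining nodes."""
    E = {frozenset(e) for e in edges}
    others = [v for v in range(n) if v != start]
    return sum(
        all(frozenset(p) in E for p in zip((start,) + rest, rest))
        for rest in permutations(others)
    )

CYCLE4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(hamiltonian_paths(4, CYCLE4), hamiltonian_paths_brute(4, CYCLE4))
```

Only the current vector of n walk counts is stored per subset, which is what keeps the space requirement small compared with dynamic programming over the subsets.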

Perspective. Hamiltonicity is one of the earliest explicitly algorithmic
applications of inclusion-exclusion. It appears in [12], but implicitly already in [14],
where it is described for the traveling salesman problem with bounded integer
weights. Both these papers have lived in relative obscurity until recently; for
example, the TSP result has been both reproved and called 'open' in a number
of places.

Hamiltonicity is also the canonical application of another algorithmic
technique, dynamic programming over the subsets, which yields an algorithm
with similar time bounds but exponential space. Thus, we can observe a
curious cultural difference in the default approach to hard sequencing problems:
Dynamic programming is the well-known solution to Hamiltonicity, while the
inclusion-exclusion formulation is often overlooked. For the permanent (4), the
situation is reversed.

8 Steiner tree. For a graph $G = (N, E)$ and a subset $\{t_1, \ldots, t_k\} \subseteq N$ of
nodes called terminals, a Steiner tree is a tree in $G$ that contains all terminals.
We want to determine the smallest size of such a tree.

We consider a related structure that is to a tree what a walk is to a path. A
willow $W$ consists of a multiset of nodes $S(W)$ from $N$ and a parent function
$p\colon S(W) \to S(W)$, such that repeated applications of $p$ end in a specified root
node $r \in S(W)$. The size of $W$ is the number of nodes in $S(W)$, counted with
repetition.

Every tree can be turned into a willow, by selecting an arbitrary root and
orienting the edges towards it, but the converse is not true. However, a minimum
size willow over a set of nodes is a tree: Assume a node appears twice in $W$.
Remove one of its occurrences $u \in S(W)$, not the root node, and change the
parent $p(v)$ of all $v$ with $p(v) = u$ to $p(v) = p(u)$. The resulting willow is smaller
than $W$ but spans the same nodes. Finally, when all repeated nodes are removed,
$p$ defines a tree.

Thus, it suffices to look for a size-$\ell$ willow $W$ that includes all terminals, for
increasing $\ell = k, \ldots, n$. Set $A_i$ to be the set of willows of size $\ell$ that avoid $t_i$.
Then, from (9), the number of willows of size $\ell$ that include all terminals is

$$\sum_{X \subseteq K} (-1)^{|X|} a^{\ell}(X) ,$$

where $a^{\ell}(X) = \bigl| \bigcap_{i \in X} A_i \bigr|$ is the number of willows of size $\ell$ that avoid the
terminals in $X$.

Again, we can use dynamic programming to compute $a^{\ell}(X)$ for given $X$. For
all $X \subseteq V$ and $u \notin X$ let $a^{\ell}(X, u)$ denote the number of willows of size $\ell$ that avoid
$X$ and whose root is $u \notin X$. Then $a^1(X, u) = 1$ and

$$a^{\ell}(X, u) = \sum_{uv \in E} \sum_{i=1}^{\ell - 1} a^{i}(X, u) \, a^{\ell - i}(X, v) .$$

Perspective. This application is from [16]. The role of inclusion-exclusion is
slightly different from the Hamiltonian path construction in the previous sub- 
section, because we have no control over the size of the objects we are sieving 
for. 

There, we sifted through all walks of length n. What was left in the sieve were 
the walks of length n that visit all n nodes. Thus, every node appears exactly 
once, so that sieve contained exactly the desired solutions, i.e., the Hamiltonian 
paths. 

Here, we sift through all willows of size $\ell$. What is left in the sieve are the
willows that visit all terminals. For given $\ell$, these are not necessarily trees.
Instead, correctness hinges on the fact that we already sifted through willows of
smaller size. To strain the metaphor, we use increasingly fine sieves until we find 
something. 

9 Long paths. Consider a graph $G = (N, E)$ and integer $k \le n$. We want to
detect if $G$ has a path of length $k$. Inspired by the Hamiltonicity construction in
§7 we look at all walks $(w_1, \ldots, w_k)$ on $k$ nodes. For expository reasons we again
stipulate that all walks begin in a fixed node $w_1 = 1$. Write $K = \{1, \ldots, k\}$.

For every edge $e$ pick a random value $r(e)$. For every vertex $v$ and integer $k \in K$
pick a random value $r(v, k)$. With foresight, the values are chosen uniformly
at random from a finite field $F$ of characteristic 2 and size at least $2k(k-1)$. All
computation is in this field. For every walk $W = (w_1, \ldots, w_k)$ starting in $w_1 = 1$
and every function $\phi\colon K \to K$ define the term

$$p(W, \phi) = \prod_{i=1}^{k-1} r(w_i w_{i+1}) \prod_{i=1}^{k} r(w_i, \phi(i)) ,$$

see Fig. 6.

Fig. 6. Left: The path $W = (u, v, x)$. Middle: The nodes of $W$ labelled with two
permutations. Right: The terms associated with $W$ and the two permutations,
namely $r(uv) \cdot r(vx) \cdot r(u, 1) \cdot r(v, 2) \cdot r(x, 3)$ for the labelling $(1, 2, 3)$ and
$r(uv) \cdot r(vx) \cdot r(u, 2) \cdot r(v, 3) \cdot r(x, 1)$ for the labelling $(2, 3, 1)$.

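Consider the sum over all walks $W$ in $G$ and all permutations $\pi \in S_k$,

$$p(G) = \sum_{W} \sum_{\pi \in S_k} p(W, \pi) . \tag{11}$$

We will show below that

$$\Pr\bigl( p(G) \ne 0 \bigr) \begin{cases} \ge \tfrac{1}{2} , & \text{if $G$ contains a $k$-path;} \\ = 0 , & \text{otherwise,} \end{cases} \tag{12}$$

where the probability is taken over the random choices of $r$.

To compute (11) we first recognise a summation over a permutation and
replace it by an alternating sum over functions with restricted range, as in (6):

$$\sum_{\pi \in S_k} \sum_{W} p(W, \pi) = \sum_{S \subseteq K} (-1)^{|K \setminus S|} \sum_{\phi\colon K \to S} \sum_{W} p(W, \phi) .$$

For each $S \subseteq K$, the value of the two inner sums can be computed efficiently
using dynamic programming; we omit the details. The total running time is
within a polynomial (in $n$) factor of $2^k$.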

Proof of (12). Consider the contribution of every walk $W$ to (11).

First assume that $W$ is non-simple and let $\pi$ be a permutation. We will
construct another permutation $\rho$ such that $\pi \ne \rho$ but $p(W, \pi) = p(W, \rho)$. Thus,
summing over all permutations, the contribution of $W$ is even and therefore
vanishes in $F$. To construct $\rho$, let $(i, j)$ be the first self-intersection on $W$, i.e.,
the lexicographically minimal pair with $w_i = w_j$ and $i < j$. Set $\rho$ equal to $\pi$
except for $\rho(i) = \pi(j)$ and $\rho(j) = \pi(i)$.

Now assume that $W$ is a path. It is useful to view (11) as a polynomial in
variables $x(e)$, $x(v, k)$, evaluated at random points $x(e) = r(e)$, $x(v, k) = r(v, k)$.
For every permutation $\pi$, the monomial

$$\prod_{i=1}^{k-1} x(w_i w_{i+1}) \prod_{i=1}^{k} x(w_i, \pi(i))$$

is unique. To see this, both $W$ and $\pi$ can be recovered from $p(W, \pi)$ by first
reconstructing the nodes $w_1, \ldots, w_k$ in order, starting at $w_1 = 1$ and following
the unique incident edge described by the terms $x(e)$, and then reconstructing
$\pi$ from the terms $x(w_i, \pi(i))$. Thus, (11) can be viewed as a nonzero polynomial
of degree $k(k-1)$ evaluated at $m + nk$ random points from $F$. By the
DeMillo-Lipton-Schwartz-Zippel lemma [9,18], it evaluates to zero with probability less
than $k(k-1)/|F| \le \tfrac{1}{2}$. □

Perspective. The construction is implicit in [5] and not optimal. Again, the
starting point is the same as for Hamiltonicity in §7: to sieve for paths among
walks, whose contribution is computed by dynamic programming.

However, instead of counting the number of walks, we define an associated
family of algebraic objects (namely, a multinomial defined by the walk and a
permutation) and work with these objects instead. Strictly speaking, we did
associate algebraic objects to walks even before, but the object was somewhat
innocent: the integer 1.

There are two filtering processes at work: The sifting for paths among walks
is performed by the cancellation of non-simple, permutation-labelled walks
in characteristic 2, rather than by the inclusion-exclusion sieve. At the danger
of overtaxing the sieving metaphor, the permutation-labelling plays the role of
mercury in gold mining; inclusion-exclusion ensures that the 'mercury' can be
added and removed in time $2^k$ instead of the straightforward $k!$.



10 Yates's algorithm. Let $f\colon 2^N \to \{0, 1\}$ be the indicator function of the
nonempty independent sets in a graph. We will revisit the task of §3, computing

$$g(S) = \sum_{R \subseteq S} f(R) , \tag{13}$$

for all $S \subseteq N$.

The computation proceeds in rounds $i = 1, \ldots, n$. Initially, set $g_0(S) = f(S)$
for all $S \subseteq N$. Then we have, for $i = 1, \ldots, n$,

$$g_i(S) = g_{i-1}(S) + [i \in S] \cdot g_{i-1}(S \setminus \{i\}) \qquad (S \subseteq N) . \tag{14}$$

Finally, $g(S) = g_n(S)$.
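The rounds of (14) can be carried out in place on a table indexed by bitmasks. The Python sketch below is an added illustration with assumed names; elements are numbered 0, ..., n-1 rather than 1, ..., n, and the example function is the indicator of the nonempty independent sets of a hypothetical 3-node path.

```python
def zeta_transform(f, n):
    """Yates's algorithm: g[S] = sum of f[R] over all R contained in S."""
    g = list(f)                                # g_0 = f
    for i in range(n):                         # round i handles element i
        bit = 1 << i
        for S in range(1 << n):
            if S & bit:                        # the factor [i in S] of (14)
                g[S] += g[S ^ bit]             # add g_{i-1}(S \ {i})
    return g

def zeta_naive(f, n):
    """Direct evaluation of (13), for comparison."""
    return [
        sum(f[R] for R in range(1 << n) if R & S == R)
        for S in range(1 << n)
    ]

# Indicator of the nonempty independent sets of the path 0 - 1 - 2:
# the singletons (masks 1, 2, 4) and the pair {0, 2} (mask 5).
f = [0, 1, 1, 0, 1, 1, 0, 0]
print(zeta_transform(f, 3))
```

Updating in place is safe because round i only modifies entries whose mask contains element i, while it only reads entries whose mask does not. The total work is n additions per table entry.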

Proof of (14). The intuition is that $g_0(S), \ldots, g_n(S)$ approach $g(S)$
'coordinate-wise' by fixing fewer and fewer bits of $S$. To be precise, for $i = 1, \ldots, n$,

$$g_i(S) = \sum_{R \subseteq S} \bigl[ S \cap \{i+1, \ldots, n\} = R \cap \{i+1, \ldots, n\} \bigr] \cdot f(R) . \tag{15}$$

In particular, $g_n(S) = \sum_{R \subseteq S} f(R)$. Correctness of (14) is established by a
straightforward but tedious induction argument for (15). The base case $g_0 = f$ is
immediate. For the inductive step, adopt the notation $S(i)$ for $S \cap \{i+1, \ldots, n\}$.
Then the right hand side of (15) can be written as

$$\sum_{R \subseteq S} [S(i) = R(i)] f(R) = \sum_{\substack{R \subseteq S \\ i \in R}} [S(i) = R(i)] f(R) + \sum_{\substack{R \subseteq S \\ i \notin R}} [S(i) = R(i)] f(R) .$$

If $i \notin S$ then the first sum vanishes and the second sum simplifies to

$$\sum_{R \subseteq S} [S(i-1) = R(i-1)] f(R) = g_{i-1}(S)$$

by induction. If $i \in S$ then we can rewrite both sums to

$$\sum_{R \subseteq S} [S(i-1) = R(i-1)] f(R) + \sum_{R \subseteq S \setminus \{i\}} [(S \setminus \{i\})(i-1) = R(i-1)] f(R) = g_{i-1}(S) + g_{i-1}(S \setminus \{i\})$$

by induction. Finally, by (14) the entire expression equals $g_i(S)$. □

Fig. 7. Yates's algorithm on the indicator function of the nonempty independent
sets of the graph $G$. Arrows indicate how the value of $g_i(S)$ for $i \in S$ is computed
by adding $g_{i-1}(S \setminus \{i\})$ to the 'previous' value $g_{i-1}(S)$.



Perspective. As before, our approach is basically dynamic programming for a
decrease-and-conquer recurrence. The time and space requirements are within
a linear factor of the ones given in §3. However, the expression (13) is more
general and does not depend on the structure of independent sets. It applies
to any function $f\colon 2^N \to R$ from subsets to a ring, extending the algorithm to
many other covering problems than graph colouring.

Yates's algorithm has much in common with the fast Fourier transform. We
can illustrate its operation using a butterfly-like network, see Fig. 8.




Fig. 8. Yates's algorithm for the zeta transform. (The columns of the network are labelled $g_0, g_1, g_2, g_3, g_4$.)



Here we used Yates's algorithm to compute (13), but the method is more
general than that. For example, it computes the Möbius transform, see (17)
below, and many others. A classical treatment of the algorithm appears in [13];
recent applications and modifications are in [7] and the forthcoming journal
version of [3].



11 Möbius inversion. Let $f\colon 2^N \to \{0, 1\}$ be a function of subsets of $N$ to
$\{0, 1\}$ (indeed, any ring would do). To connect to the graph colouring example
from §2, think of $f$ as the indicator function of the nonempty independent sets
in a graph. The zeta transform of $f$ is the function $(f\zeta)\colon 2^N \to \{0, 1\}$ defined
point-wise by

$$(f\zeta)(T) = \sum_{S \subseteq T} f(S) . \tag{16}$$

The brackets around $(f\zeta)$ are usually omitted. The Möbius transform of $f$ is the
function $(f\mu)\colon 2^N \to \{0, 1\}$ defined point-wise by

$$(f\mu)(T) = \sum_{S \subseteq T} (-1)^{|T \setminus S|} f(S) . \tag{17}$$

This allows us to state the principle of inclusion-exclusion in yet another
way:

$$f\zeta\mu = f\mu\zeta = f . \tag{18}$$

Proof. We show $f\zeta\mu = f$, the other argument is similar.

$$f\zeta\mu(T) = \sum_{S \subseteq T} (-1)^{|T \setminus S|} \sum_{R \subseteq S} f(R) = \sum_{R} f(R) \sum_{S} [R \subseteq S \subseteq T] \, (-1)^{|T \setminus S|} .$$

By (1), the inner sum equals $[R = T]$, so the expression simplifies to $f(T)$. □
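The two transforms and the inversion claim can be checked directly from their definitions. The Python sketch below is an added illustration with assumed names; it evaluates (16) and (17) literally by enumerating the subsets of each bitmask, which is slower than Yates's algorithm but mirrors the formulas term by term. Values are ordinary integers, in line with the remark that any ring would do.

```python
def subsets(T):
    """Yield every submask S of the bitmask T, including T and 0."""
    S = T
    while True:
        yield S
        if S == 0:
            return
        S = (S - 1) & T

def zeta(f, n):
    """Definition (16): sum of f over all subsets of T."""
    return [sum(f[S] for S in subsets(T)) for T in range(1 << n)]

def moebius(f, n):
    """Definition (17): the same sum with sign (-1)^|T minus S|."""
    return [
        sum((-1) ** bin(T ^ S).count("1") * f[S] for S in subsets(T))
        for T in range(1 << n)
    ]

# Indicator of the nonempty independent sets of a hypothetical path 0 - 1 - 2.
f = [0, 1, 1, 0, 1, 1, 0, 0]
print(zeta(f, 3))
print(moebius(zeta(f, 3), 3) == f, zeta(moebius(f, 3), 3) == f)
```

For a submask `S` of `T`, the expression `T ^ S` is exactly the set difference, so its bit count is the exponent in (17).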




Fig. 9. Möbius inversion.



Perspective. For completeness, let us also derive (1) from (18), to see that the
two claims are equivalent. Consider two sets $R$ and $T$. Define $f(Q) = [Q = R]$.
Then, expanding (18),

$$[R = T] = f(T) = \sum_{S \subseteq T} (-1)^{|T \setminus S|} \sum_{Q \subseteq S} f(Q) = \sum_{S \subseteq T} (-1)^{|T \setminus S|} [R \subseteq S] = \sum_{R \subseteq S \subseteq T} (-1)^{|T \setminus S|} .$$

12 Covering by Möbius inversion. We now give an alternative argument for the
graph colouring expression (2).

Fig. 10. Covering by Möbius inversion for $k = 2$ and $k = 3$.

Let $f\colon 2^N \to \{0, 1\}$ be the indicator function of the nonempty independent
sets of a graph $G = (N, E)$. We want to count the number of ways to cover $N$
with $k$ nonempty independent sets. Define $g(S)$ to be the number of ways to
choose $k$ nonempty independent sets whose union is $S$. Then we claim

$$g\zeta = (f\zeta)^k .$$

To see this, for every $T \subseteq V$, view $g\zeta(T)$ and $(f\zeta(T))^k$ as two different ways
of counting the number of ways to select $k$ nonempty independent subsets of $T$.
Now, by Möbius inversion (18), we have

$$g = (f\zeta)^k \mu ,$$



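which is the left hand side of (2). In fact, we can now rewrite and understand
the left hand side of (2) as

$$\sum_{S \subseteq N} (-1)^{|N \setminus S|} \Bigl( \sum_{R \subseteq S} f(R) \Bigr)^{k} ,$$

where raising to the $k$th power is the operation in the transformed domain
and $f$ is the function in the original domain.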
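The transform, operate, transform-back pipeline can be sketched as follows. This Python code is an added illustration, not the original presentation: the helper `yates` runs the rounds of Yates's algorithm with addition for the zeta transform and with subtraction for the Möbius transform, the names are assumptions, and the graph is a hypothetical 4-node example.

```python
def yates(values, n, sign):
    """Subset-sum rounds; sign=+1 gives the zeta, sign=-1 the Moebius transform."""
    out = list(values)
    for i in range(n):
        for S in range(1 << n):
            if S >> i & 1:
                out[S] += sign * out[S ^ (1 << i)]
    return out

def cover_counts(n, edges, k):
    """g[S] = number of k-tuples of nonempty independent sets with union S."""
    def independent(S):
        return all(not (S >> u & 1 and S >> v & 1) for u, v in edges)
    f = [int(S != 0 and independent(S)) for S in range(1 << n)]
    f_zeta = yates(f, n, +1)                   # into the transformed domain
    powered = [x ** k for x in f_zeta]         # pointwise k-th power
    return yates(powered, n, -1)               # and back again

# Hypothetical graph on nodes 0..3: triangle 0, 1, 2 plus the edge 0-3.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2)]
g = cover_counts(4, EDGES, 3)
print(g[0b1111], sum(g))
```

The entry for the full node set is the left hand side of (2), and summing g over all S recovers the k-th power of the number of nonempty independent sets, since every k-tuple has exactly one union.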



Perspective. The fact that $f$ was the indicator function of the independent sets
played no role in this argument. It works for many covering problems, and
with some work also for packing and partitioning problems.

Taxonomically, we can view inclusion-exclusion as a transform-and-conquer
technique, like the Fourier transform. This can be expressed succinctly in terms
of Möbius inversion, illustrated in Fig. 10. The zeta transform translates the
original problem into a different domain, where the computation is often easier,
and the Möbius transform translates back. In the covering example, the
operation in the transformed domain, exponentiation, is particularly simple. The idea
extends to many other operations in the transformed domain, see [3].



Concluding remarks. A comprehensive presentation of many of the ideas
mentioned here appears in a recent monograph [10], with many additional examples.
In particular a whole chapter is devoted to subset convolution, the most glaring
omission from the present introduction.

I owe much of my understanding of this topic to illuminating conversations
with my collaborators, Andreas Björklund, Petteri Kaski, and Mikko Koivisto.